Removal of Spectro-Polarimetric Fringes by 2D Pattern Recognition 

R. Casini, 1 P. G. Judge, 1 T. A. Schad, 2 

1 High Altitude Observatory, NCAR,^1 P. O. Box 3000, Boulder, CO 80307-3000, U.S.A.
2 Lunar and Planetary Lab, University of Arizona, Tucson, AZ 85721, U.S.A. 






ABSTRACT 

We present a pattern-recognition based approach to the problem of removal of polarized
fringes from spectro-polarimetric data. We demonstrate that 2D Principal Component
Analysis can be trained on a given spectro-polarimetric map in order to identify
and isolate fringe structures from the spectra. This allows us in principle to reconstruct 
the data without the fringe component, providing an effective and clean solution to the 
problem. The results presented in this paper point in the direction of revising the way 
that science and calibration data should be planned for a typical spectro-polarimetric 
observing run. 



1. Introduction 

The observation and interpretation of wavelength-dependent polarization signals in spectral
lines is the primary method for the diagnostics of anisotropic processes in astrophysical plasmas,
such as those induced by the presence of deterministic electric and magnetic fields (e.g., Stenflo 1994;
Landi Degl'Innocenti & Landolfi 2004; Casini & Landi Degl'Innocenti 2008; Trujillo Bueno 2010),
or by plasma collisions with collimated beams of ions (e.g., Fujimoto 2008). At the same time, the
depolarizing effects of isotropic collisions and of quasi-random electro-magnetic fields can yield
information on the density of the plasma constituents, as well as important insights on the overall
complexity of turbulent plasmas at diverse spatial and temporal scales (e.g., Casini, Manso Sainz, & Low
2009).



The amplitudes of polarization signals observed in astrophysical plasmas vary widely, ranging
from the very small signatures (< 10^-3 I, where I is the radiation intensity) typical of the weak-field
modifications of scattering polarization by the Hanle effect (see, e.g., Landi Degl'Innocenti & Landolfi
2004), to large amplitudes (> 10^-1 I) induced by the Zeeman effect in the presence of strong magnetic
fields, such as those found in sunspots. The weaker polarization signals are easily swamped
by systematic errors associated with instrumental effects, which are often difficult to model to the
level of precision (polarization accuracy) that is needed in order to isolate the true signals coming
from the observed physical system. The calibration of this instrumentally induced polarization



1 The National Center for Atmospheric Research (NCAR) is sponsored by the National Science Foundation.
Fig. 1. - Example of spectro-polarimetric data, showing real spectral line polarization signatures
superimposed on instrumental effects, including polarized fringes and detector noise. The abscissa
is wavelength, while the ordinate is position along the instrument's slit. The spectral range includes
the line of Si I at 1082.7 nm (around X = 50), the two components of He I at 1083 nm (around
X = 100 and X = 130), the telluric H2O line (around X = 180), and the Ca I doublet at 1083.3 nm
(approximately between X = 210 and X = 230).



is a difficult art. At the same time, the pursuit of ever finer spatial and temporal scales in the
investigation of astrophysical plasmas puts continually growing demands on both sensitivity (i.e.,
signal-to-noise) and accuracy of polarimetric observations (e.g., Rimmele et al. 2008). Even higher
demands are being made by the scientifically critical need to measure chromospheric magnetic fields
(Judge 2010).



Polarized fringes are commonly found in spectro-polarimetric data. These are interference
patterns that arise because of the presence of optical components (also including air) in a
spectro-polarimeter, which have different refractive and/or birefringent properties. Such components
include polarization modulators, polarizing beam-splitters, and any optical system where parallel
optical interfaces may occur (e.g., interference filters, detector windows). These fringes have the
appearance of more or less regular bidimensional patterns, often curved, which typically unfold
along the spectral dimension of the data (see Fig. [1]). We refer to review studies of polarized fringes
(Semel 2003; Clarke 2004) for a thorough description of this phenomenon. Here, we are concerned
exclusively with the treatment of this artifact during data reduction.

The treatment of polarized fringes has been a recurring problem for the reduction and analysis
of spectro-polarimetric data. The state of the art is to attempt removal of these (and other)
instrumental effects using Fourier methods or wavelet analysis (e.g., Rojo & Harrington 2006).
Fourier filtering has been successful at removing various types of data artifacts, when their range
of spectral and/or spatial frequencies is clearly separated from that of the actual signal of the
observed source (L. Kleint, private communication). Wavelet analysis attempts to identify the
dominant frequencies and phases of the fringe pattern in a data frame, when the artifacts are not
strictly periodic. Rojo & Harrington (2006) have developed a localized solution to fringe pattern
reconstruction by employing two-dimensional wavelet transforms, which excel at tracking smooth 
variations in phase and amplitude of a periodic signal. The shape of a fringe pattern can be isolated 
in the wavelet space of individually transformed image rows, each row corresponding to a spatial 
point in the map. An inverse wavelet transform of the fringe space then can reconstruct the fringe 
pattern in the spatial domain. This can be a particularly powerful method for removing fringes 
in flat field images, but its application to object images (especially polarized spectral images) is 
complicated by the contribution of the targeted signal to the local wavelet transform. 

Unfortunately, fringe patterns are seldom regular, having an intrinsic bidimensional structure, 
with variations of the amplitude, frequency, and phase, which can be significant in both the spectral 
and spatial dimensions (see Fig. [1]). Often fringe patterns result from the combination of more than
one component, and this combination can also vary smoothly across the dataset. In this case, a 
two-dimensional wavelet analysis of the observations is challenged by the need to treat every frame 
separately. In this paper, we propose instead that the identification and removal of polarized fringes 
might be better approached as a problem of pattern recognition. 

Since polarized fringes typically arise within the spectro-polarimeter,^2 we expect their structure
to be predominantly a function of the instrument configuration. Let us consider a scanning slit 
instrument. During the spatial scan of a given target, the instrument configuration is approximately 
fixed. We can expect that the polarized fringes will constitute an approximately time-independent 
pattern within that particular dataset, although there may be some dependence of the fringes' 
appearance (in both amplitude and phase) on the polarization state of the light entering the spectro- 
polarimeter. As the spatial scan is acquired, the polarimetric signal of the target will then change 
over an approximately constant fringe background. Heuristically, an "orthogonality" exists between 
the true polarimetric signal that we wish to analyze and this "fixed" fringe background, as a 
consequence of the fact that the two sources of the line signals and of the background are largely 
uncorrelated. This suggests that the problem of identification and removal of polarized fringes 
should be approached as a problem of pattern recognition and feature extraction from a
two-dimensional dataset. For this paper we decided to approach this problem using two-dimensional
Principal Component Analysis (2D PCA; Yang et al. 2004). Other methods could potentially be
adopted instead, which also have been used for the separation of signals from uncorrelated sources,
such as the Independent Component Analysis (ICA; e.g., Jutten & Herault 1991). We defer the
study of some of these alternative methods for the problem of the identification and removal of
polarized fringes to future work.

2 A notable exception is represented by fringing that is caused by the polarization calibration optics, which may
reside outside the spectro-polarimeter, often well upstream in the optical system of the telescope.

The simple fact that spectral line signals and polarized fringe background are largely
uncorrelated is central to the success of the PCA approach to the problem at hand. This concept is
nicely summarized by Jolliffe (2002) in the Introduction to his book: "The central idea of principal
component analysis (PCA) is to reduce the dimensionality of a data set consisting of a large number 
of interrelated variables, while retaining as much as possible of the variation present in the data set. 
This is achieved by transforming to a new set of variables, the principal components (PCs), which 
are uncorrelated, and which are ordered so that the first few retain most of the variation present 
in all of the original variables." 

Preliminary considerations on the problems associated with PCA filtering of polarized
fringes from Stokes profiles were already advanced in a study on compressed sensing of experimental
data (Asensio Ramos & López Ariste 2010). In this work, we provide a novel and more in-depth
investigation of the problem. In Sect. [2] we summarize the main ideas behind two-dimensional PCA,
and show how these can be applied to the specific problem of the removal of polarized fringes from
spectro-polarimetric data. In Sect. [3] we describe the observations from which the datasets used for
the testing of the method were extracted. In Sect. [4] we present various examples of application of
the method to the datasets previously described, and comment on the quality and reproducibility 
of the results. Finally, in our concluding remarks, we briefly discuss possible observation and data 
calibration strategies that could help fully realize the potential of the proposed method. 



2. Two-dimensional Principal Component Analysis 



The theory behind PCA is well established (Pearson 1901; Jolliffe 2002). PCA has been
successfully applied to a related but different problem of the inversion of spectro-polarimetric data
for the inference of the thermodynamic and magnetic structure of the solar atmosphere (Rees et al.
2000; Socas-Navarro et al. 2001; López Ariste & Casini 2002; Casini, Bevilacqua, & López Ariste
2005; Casini et al. 2009). In this application, PCA is used to identify an orthogonal set of spectral
eigenfeatures characteristic of a given line formation model in a magnetized atmosphere. One 
then determines the principal components (projections) of the observed spectra on this orthogonal 
eigenbasis, and searches within a pre-calculated database of model profiles for the closest set of 
components to the observed set. The quality of the inversion is estimated by the "PCA distance" 
between the observed and inverted points in the PCA-component space. This distance is akin to 
the usual χ^2 estimator for a non-linear least-squares fit. In this special application, PCA then
produces a best fit of the observed Stokes profiles as a function of wavelength. PCA is able to 
capture the structures of the observed profiles that correspond to the physical model used for the 
construction of the inversion database. Systematic errors introduced by the instrument, or by 
physical mechanisms not included in the model, cannot be captured by the PCA inversion, and 
consequently they tend to increase the inversion noise (i.e., the PCA distance). Because every 
spatial point is emitting incoherently from all the other points in the map, the spectro-polarimetric 
inversion must be performed separately for each spatial point of the scan.



The application of PCA to our problem is instead subject to rather different constraints. First 
of all, polarized fringes are typically very hard to model, and their appearance - such as frequency, 
phase, as well as shape - can be very specific to the particular observing and instrumental setup,
and may change significantly between different observations. Therefore, rather than creating a 
model-based database of fringes to compare with the observations, one should extract the PCA 
eigenfeatures of the fringe pattern directly from the observed data. Another substantial difference 
is that the treatment of fringes is better done simultaneously on the entire detector frame, because 
of the intrinsic bidimensional nature of these instrumental artifacts. The correct approach should 
then be analogous to the one adopted in the application of PCA to face recognition. 

To proceed further, we need to choose one among the various PCA algorithms that have been
developed for face recognition (Kirby & Sirovich 1990; Turk & Pentland 1991; Yang et al. 2004).
The 2D PCA algorithm proposed by Yang et al. (2004) appears to outperform other approaches,
and so we selected it for our problem. We summarize here below the fundamental ideas of this
algorithm.

We indicate with A_i, for i = 1, ..., N, a set of images (e.g., the sequential series of the frames
in a spectro-polarimetric map), each represented by an m x n matrix. From these, we can create the
n x n covariance matrix

    C = \frac{1}{N} \sum_{i=1}^{N} (A_i - \bar{A})^T (A_i - \bar{A}) , \qquad \bar{A} = \frac{1}{N} \sum_{i=1}^{N} A_i . \qquad (1)

It is demonstrated (e.g., Jolliffe 2002) that the optimal set of projection vectors for the decomposition
of the data, for the purpose of extracting its principal components, is represented by the n
eigenvectors of the covariance matrix. These eigenvectors can conveniently be determined by performing
the singular value decomposition (SVD) of C. The result is a set of mutually orthogonal,
n-dimensional vectors, U_j, with j = 1, ..., n, which can be interpreted as the column vectors of
an n x n orthogonal matrix U = {U_1, ..., U_n}. It is customary to order the set of eigenvectors
{U_j}_{j=1,...,n} (the eigenfeatures) according to the decreasing amplitude of the associated singular
values, \sigma_j, so that U_1 corresponds to the singular value with the largest amplitude. In turn, the
singular values can be interpreted as the weights of the corresponding eigenfeatures in their contribution
to the dataset {A_i}_{i=1,...,N}. The set {U_j}_{j=1,...,n} is complete, and thus forms a basis for the
dataset {A_i}_{i=1,...,N}.
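
The construction of the covariance matrix and of its eigenfeatures can be sketched numerically as follows. This is only a minimal illustration, assuming that the map is stored as a NumPy array of shape (N, m, n); the function names and the synthetic random data are hypothetical and are not taken from the reduction software used in this work.

```python
import numpy as np

def covariance_2dpca(frames):
    """Covariance matrix of eq. (1) for frames of shape (N, m, n)."""
    frames = np.asarray(frames, dtype=float)
    diff = frames - frames.mean(axis=0)          # A_i minus the mean frame
    # Sum (A_i - mean)^T (A_i - mean) over frames i and rows r, divide by N.
    return np.einsum("irj,irk->jk", diff, diff) / frames.shape[0]

def eigenfeatures(cov):
    """Eigenfeatures (columns of U) and singular values, largest first."""
    U, sigma, _ = np.linalg.svd(cov)             # singular values come sorted
    return U, sigma

# Synthetic stand-in for a map: N = 45 frames, m = 60 rows, n = 25 columns.
rng = np.random.default_rng(0)
frames = rng.normal(size=(45, 60, 25))
cov = covariance_2dpca(frames)
U, sigma = eigenfeatures(cov)
```

With this layout the last array axis plays the role of the spectral dimension, so that C is n x n and each eigenfeature is a vector over wavelength.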

Once the basis {U_j}_{j=1,...,n} has been determined, the set of principal components (or projections)
for each of the elements of the dataset {A_i}_{i=1,...,N} is

    V_i = A_i U , \qquad i = 1, \ldots, N . \qquad (2)

We note that each V_i is an m x n matrix, which can be thought of as the set of m-dimensional
column vectors, {V_{ij}}_{j=1,...,n}, such that

    V_{ij} = A_i U_j , \qquad i = 1, \ldots, N , \quad j = 1, \ldots, n . \qquad (3)
Since U is an orthogonal matrix (U^{-1} = U^T), eq. ([2]) can be immediately inverted,

    A_i = V_i U^T = \sum_{j=1}^{n} V_{ij} U_j^T , \qquad i = 1, \ldots, N . \qquad (4)

This equation represents the algorithm for the reconstruction of the image A_i from its PCA component
matrix, V_i, and the PCA eigenbasis U. In particular, any truncation of the summation on
the right-hand side of eq. ([4]) up to some index j = j_max < n will provide an approximate reconstruction of
the original image,

    \tilde{A}_i = \sum_{j=1}^{j_{max}} V_{ij} U_j^T . \qquad (5)
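
The projection and the truncated reconstruction can be illustrated with a short sketch. It assumes a synthetic random dataset and hypothetical helper names (`project`, `reconstruct`); it is meant only to show the algebra of the projection and of its full and truncated inversion, not the actual code used for the results of this paper.

```python
import numpy as np

def project(frame, U):
    """Principal components of one frame: V_i = A_i U."""
    return frame @ U

def reconstruct(V, U, j_max=None):
    """Sum of V_ij U_j^T over the first j_max orders (all orders if None)."""
    if j_max is None:
        j_max = U.shape[1]
    return V[:, :j_max] @ U[:, :j_max].T

# Synthetic dataset: N = 20 frames of size m x n = 30 x 12 (illustrative).
rng = np.random.default_rng(1)
frames = rng.normal(size=(20, 30, 12))
diff = frames - frames.mean(axis=0)
cov = np.einsum("irj,irk->jk", diff, diff) / len(frames)
U, sigma, _ = np.linalg.svd(cov)

V0 = project(frames[0], U)
full = reconstruct(V0, U)             # recovers the frame exactly
approx4 = reconstruct(V0, U, j_max=4)
approx8 = reconstruct(V0, U, j_max=8)
err4 = np.linalg.norm(frames[0] - approx4)
err8 = np.linalg.norm(frames[0] - approx8)
```

Because the retained eigenfeatures span nested subspaces, the residual norm cannot increase when more orders are included; whether the added orders carry signal or fringes is a separate question.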

If the image set is characterized by a low degree of entropy (that is, a high level of ordering 
- like for the image of an object, or, in our case, the Sun's line spectrum between two fixed 
wavelengths), the amplitudes of the singular values drop very fast - typically, by several orders of 
magnitude within the first few eigenfeatures. This implies that the information content of the data 
is practically confined within a set of eigenfeatures which is much smaller than the complete set,^3
which, on the other hand, will also contain information on less significant and/or more random 
features of the images, including noise. 
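
One simple way to inspect how quickly the singular values drop is to count how many leading orders account for a given fraction of their sum. The sketch below is a generic diagnostic under that assumption; the function name and the example values are hypothetical, and it is not proposed here as a criterion for truncating the reconstruction.

```python
import numpy as np

def orders_for_weight(sigma, fraction):
    """Smallest number of leading orders whose singular values add up to
    at least `fraction` of the total (hypothetical diagnostic)."""
    sigma = np.asarray(sigma, dtype=float)
    cumulative = np.cumsum(sigma) / np.sum(sigma)
    # Index of the first cumulative weight that reaches the requested fraction.
    return int(np.searchsorted(cumulative, fraction) + 1)

# Illustrative singular values dropping by one order of magnitude per step.
sigma_example = np.array([100.0, 10.0, 1.0, 0.1])
n_orders = orders_for_weight(sigma_example, 0.99)
```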

Equation ([1]) shows that a perfectly constant background affecting all images A_i is completely
removed from the expression of the covariance matrix C. In such an ideal case, the eigenvectors
{U_j}_{j=1,...,n} constitute an optimal set of projection vectors for the relevant signal in the data, but
not for the background. One should then expect to be able to reconstruct the signal to any degree of
precision (including noise), while leaving the constant background out. This, of course, is never the
case with real spectro-polarimetric data. Residual fringing is going to be present in the covariance
matrix, and this ultimately affects the ability to reconstruct spectral signals that happen to be
co-located with the fringes, and with comparable amplitudes.
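
The cancellation of a constant background in the covariance matrix can be made explicit in one line. The symbols L_i (the varying line signal of frame i) and F (a frame-independent fringe background) are introduced here only for illustration and do not appear elsewhere in the text.

```latex
A_i = L_i + F \quad (i = 1, \ldots, N)
\;\Longrightarrow\;
\bar{A} = \bar{L} + F , \qquad
A_i - \bar{A} = L_i - \bar{L} , \qquad
C = \frac{1}{N} \sum_{i=1}^{N} (L_i - \bar{L})^T (L_i - \bar{L}) .
```

Any frame-to-frame variation of F breaks this cancellation and leaves a residual fringe contribution in C, which is the situation encountered with real data.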

Note that the form of eq. ([1]) implies that the bidimensional dataset is summed (contracted)
over the m rows of the differential images (A_i - \bar{A}). Alternatively, we could have defined the
covariance matrix as the average of the products (A_i - \bar{A})(A_i - \bar{A})^T over the dataset, implying
a contraction over the n columns of the differential images. In PCA applications to face recognition
it typically is irrelevant whether the PCA covariance matrix is computed by contracting
the bidimensional data over the X- or the Y-axis. For the analysis of Stokes maps, instead, the 
spectral dimension (conventionally identified in this work with the X-axis) represents a privileged 
coordinate. The reason is two-fold. First of all, it is essential for our problem that we preserve 
the distinction among the different spectral features in a Stokes map. In fact, because different 
spectral lines may be formed under very diverse atmospheric conditions, and have distinct thermal 
and magnetic diagnostic properties, it is important that their characteristics be kept distinct in 



3 Incidentally, this property provides a convenient means for data compression. 






the PCA decomposition of the Stokes data. If we were instead to contract the data along the 
spectral dimension, the diagnostic information from all spectral lines would be merged together. 
Furthermore, preserving the spectral dimension in the PCA decomposition of a Stokes map al- 
lows us to isolate a specific subinterval of the spectral domain, if needed. The second argument 
is that polarized fringes tend to occur preferentially along the spectral dimension. Again, if we 
were to contract the data over wavelength, the spectral information of the polarized fringes would 
be completely merged with that from the spectral lines. The immediate consequence is that each 
of the PCA eigenfeatures would then always contain the information from both spectral lines and 
polarized fringes, making the filtering out of fringes impossible. All examples shown in the follow- 
ing discussion rely on the preservation of the spectral information of the Stokes map in the PCA 
decomposition. For this reason, we adopt the definition of the covariance matrix given by eq. ([1]).
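
The two possible contractions can be compared directly in a small numerical sketch. The array sizes and variable names are illustrative assumptions; the only point is that contracting over rows yields an n x n matrix whose eigenvectors are functions of wavelength, whereas contracting over columns yields an m x m matrix whose eigenvectors are functions of slit position.

```python
import numpy as np

# Synthetic dataset: N frames, m spatial rows, n wavelength columns.
rng = np.random.default_rng(2)
N, m, n = 10, 8, 5
frames = rng.normal(size=(N, m, n))
diff = frames - frames.mean(axis=0)

# Definition adopted in the text: contract over the m rows -> n x n matrix.
cov_spectral = sum(d.T @ d for d in diff) / N

# Alternative definition: contract over the n columns -> m x m matrix.
cov_spatial = sum(d @ d.T for d in diff) / N
```

Both matrices carry the same total variance (equal traces), but only the first one preserves the spectral coordinate in its eigenvectors.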



3. Observations 



Spectro-polarimetric observations of an active region near the solar limb were taken on September
22, 2011 with the Facility Infra-Red Spectro-polarimeter (FIRS; Jaeggli et al. 2010) and the
Interferometric BIdimensional Spectrometer (IBIS; Cavallini 2006; Reardon & Cavallini 2008). Both
instruments are deployed at the Dunn Solar Telescope (DST) of the National Solar Observatory on
Sacramento Peak (NSO/SP, Sunspot, NM). FIRS was used in a single-slit, dual-beam mode, with
the slit oriented tangentially to the solar limb. The 75"-long projected slit was scanned across the
solar image, in 70 steps of 0.65", to produce images in four polarimetrically modulated states, S_i.
The spectral range of the observations spanned from 1081.93 to 1085.01 nm. 



The data were reduced using software originally developed by S. Jaeggli (Jaeggli 2011) and
modified by one of the authors (TS). The data reduction followed the standard procedures: 1)
correction for non-linearities of the detector; 2) subtraction of dark frames; 3) division by flat
fields; 4) co-registration of the two beams, including corrections for image rotation; 5) polarization
calibration; 6) de-modulation of the signals S_i to convert them into the corresponding Stokes
parameters I, Q, U, and V. Special care was taken to acquire flat fields before and after the 
scans analyzed here, which were then linearly interpolated in time before being applied to the 
science data. This flat-field correction eliminated the dominant part of the signal contributed by 
the polarized fringes. However, there remained residual fringes and some other detector artifacts 
in the processed data. The method proposed here for the identification and removal of fringes has 
been applied to these reduced data. 
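
The time interpolation of the flat fields can be sketched as follows. This is a minimal illustration under the assumption of a pixel-by-pixel linear interpolation between one flat taken before and one taken after the scan; the function name, array sizes, and time stamps are hypothetical and do not reproduce the actual reduction software.

```python
import numpy as np

def interpolate_flat(flat_before, flat_after, t_before, t_after, t_obs):
    """Linearly interpolate two flat fields to the observation time t_obs."""
    weight = (t_obs - t_before) / (t_after - t_before)
    return (1.0 - weight) * flat_before + weight * flat_after

# Illustrative flats (uniform gain drifting from 1.00 to 1.10 over 60 minutes).
flat_before = np.full((4, 6), 1.00)
flat_after = np.full((4, 6), 1.10)
flat_mid = interpolate_flat(flat_before, flat_after, 0.0, 60.0, 30.0)

# Flat-field correction of a (synthetic) dark-subtracted science frame.
science_frame = np.full((4, 6), 2.10)
corrected = science_frame / flat_mid
```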



4. 2D PCA of Stokes spectra with polarized fringes 



As we anticipated in both Sects. [1] and [2], the ability of PCA to isolate a background pattern
within a dataset depends critically on how constant the appearance of the background pattern is 
Fig. 2. - Example of PCA filtering of polarized fringes from a Stokes-U spectrum in the wavelength
region of He I 1083 nm. The He I triplet spans approximately between X = 90 and X = 150 on the
horizontal axis. The top panel shows the original data, the middle panel the PCA reconstructed
data (using the first 14 PCA eigenfeatures), and the bottom panel shows the difference of the two.
We see how most of the large-scale fringes have been taken out in the PCA reconstruction, revealing
more clearly "real" solar spectral features.

throughout the dataset. In particular, to eliminate any unwanted artifact from the scientifically 
relevant data, it is important that the realizations of the relevant signals throughout the dataset 
be as varied as possible (and ideally vanish for a statistically significant number of frames), while 
the pattern of the artifact should remain practically constant. This allows PCA to "recognize" 
the signals and the background pattern as originating from uncorrelated sources, and ultimately 
to isolate the pattern into an "orthogonal" subspace with respect to the signals. This is the 
fundamental property enabling a PCA reconstruction of the scientific data that excludes the artifact. 

An application of the concepts presented so far is illustrated by Fig. [2], showing the Stokes-U
spectrum in the wavelength region of He I 1083 nm, taken from the observations presented in
Fig. 3. - Same dataset as in Fig. [2], but with the PCA filtering applied only to the spectral region
encompassing He I 1083 nm. The avoidance of the strong signatures of the Si I line and the H2O
atmospheric line allows for an efficient removal of the fringe pattern already with half the number
of eigenfeatures as used for Fig. [2]. The bottom panels show the Stokes profiles for the spatial point
Y = 70. Star symbols show the data, the continuous line shows the reconstructed data, and the
dotted line the difference of the two.
Sect. [3]. This particular spatial scan encompasses a strongly magnetized active region with an
overlying filament, as well as a portion of the quiet Sun. In these plots, the spectral dispersion 
(wavelength) is along the horizontal axis, and the spatial dimension is along the vertical axis. From 
left to right, we recognize the spectral features of the Si I line at 1082.7 nm, the He I triplet at 
1083 nm, an H2O telluric absorption line, and the Ca I doublet at 1083.3 nm (the leftmost signal is 
part of the Na I triplet at 1083.5 nm). 

The Si I line shows a dominant signal in all frames of the map, with an average central depth 
of about 0.5 in intensity, and polarization levels typically between 10 and 20%. The H2O telluric 
line also shows some polarization, at the level of 1% for linear polarization and a few times smaller 
for circular polarization. This must be an instrumental artifact, indicating a problem with the 
demodulation of the polarized signals into the Stokes vector, likely due to incorrect flat-fielding
caused by the non-linearities of the FIRS IR detector (Jaeggli 2011). Because of this, the H2O
line also appears in all the frames of the Stokes map. The polarized fringes are also visible as
curved structures spanning the wavelength domain. For the example of Fig. [2], we performed a
PCA decomposition of the Stokes-U map, consisting of 45 slit positions over the solar disk, and
reconstructed the data for one particular slit position, using only the first 14 eigenfeatures out of the
250 total (in our decomposition, the number of eigenfeatures equals the number of wavelength points
in the scan; see Sect. [2]). We note how the signals are well captured by the PCA reconstruction,
while the fringes are "confined" within the orthogonal subspace corresponding to eigenfeatures
of order higher than #14, almost everywhere in the spectral domain of the map. The spectral ranges of the
Si I and H2O lines are an exception, precisely because of their strong polarization signals appearing 
in all frames of the map, which prevents the PCA decomposition from separating these signals from 
the background fringes. In other words, at these particular wavelengths, PCA is unable to conclude 
that the spectral line signal and the fringes are uncorrelated. 

Figure [3] shows instead the results of the PCA decomposition applied to a restricted set of 
wavelengths, encompassing just the He I triplet. The three columns show the intensity-normalized 
Q, U, and V Stokes parameters. As in the previous example, PCA decomposition is able to
isolate the fringes, because of the much more diverse appearance of the He I polarization signals in
the map. However, we also note how the fringe background can be isolated and removed already
using a PCA reconstruction that takes into account only half of the eigenfeatures that were used
for Fig. [2]. This is not surprising, since the low-order PCA eigenfeatures are always dominated by
the structures that produce the largest variance in the data, and thus the strong signals of the
Si I line, as well as the sharp H2O line, occupy exclusively the first two eigenfeatures in the PCA
decomposition of the full-range spectral maps depicted in Fig. [2], while the weaker signals of the
He I triplet and the polarized fringes only show up in later orders. In the case of the spectral
maps of Fig. [3], instead, the variance of the data is dominated by the He I signals, which therefore
appear in the PCA decomposition already in the lowest order. In fact, increasing the number of 
eigenfeatures in the reconstruction of the examples of Figs. [2] and [3] does not improve the quality of 
the fringe removal, but rather the fringe pattern starts leaking back into the reconstructed image. 
Fig. 4. - PCA reconstruction of two different Stokes-Q scan steps from the same dataset, showing
how the fringe pattern can appear with distinct frequencies at different positions in the map. The 
PCA decomposition in this case is able to identify the two frequencies in the dataset, and subtract 
the proper combination of the respective contributions. 

Again, this is due to the imperfect subtraction of the fringe background in the creation of the 
covariance matrix of the data. 
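
Restricting the decomposition to a spectral window amounts to keeping only the corresponding diagonal block of the covariance matrix, whose eigenfeatures no longer need to describe the variance of the lines left outside the window. The sketch below illustrates this on synthetic random data; the number of spatial points (40) and the helper name are assumptions, while the window X = 90 to 150 and the orders retained (7, i.e., half of 14) follow the values quoted for Figs. [2] and [3].

```python
import numpy as np

def covariance(frames):
    """Covariance matrix contracted over frames and spatial rows."""
    diff = frames - frames.mean(axis=0)
    return np.einsum("irj,irk->jk", diff, diff) / frames.shape[0]

# Synthetic map: 45 slit positions, 40 spatial points (assumed), 250 wavelengths.
rng = np.random.default_rng(3)
frames = rng.normal(size=(45, 40, 250))

x0, x1 = 90, 150                      # approximate He I range in pixel units
window = frames[:, :, x0:x1]          # keep only the selected wavelengths

cov_full = covariance(frames)
cov_window = covariance(window)       # equals cov_full[x0:x1, x0:x1]

# Eigenfeatures of the windowed data and truncated reconstruction of one frame.
U_window, sigma_window, _ = np.linalg.svd(cov_window)
j_max = 7
filtered = (window[0] @ U_window[:, :j_max]) @ U_window[:, :j_max].T
```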

In the datasets that we have analyzed from the FIRS instrument, the fringe patterns of a 
spectro-polarimetric map seem to occur always with one frequency and its first harmonic, as well as 
with various combinations of the two, although seemingly with nearly constant phases throughout 
the map. Apparently, PCA is able to identify the right combinations of these two basic fringe 
patterns and remove them from the spectral line signal (see Fig. [4]). As long as a statistically
significant number of samples of the recurring frequencies and phases of the fringes are present in
a given observation, without contamination from the data (i.e., with vanishing spectral signals),
we can reasonably assume - and it is demonstrated in practice - that the PCA decomposition will
be able to identify those characteristics, and their contributions to any given frame of the map.



[Figure 5: grid of eigenfeature panels, each titled "E.F. #n" and plotted against the spectral coordinate X (axis ticks at 50, 100, 150, 200).]

Fig. 5. - The first 18 basis eigenfeatures U_j for the Stokes-Q dataset from which the two scan steps
of Fig. [4] were taken.



On the other hand, if the frequency and/or phase of polarized fringes vary continually with the 
map step of a given dataset, then we can expect that the removal of fringes by PCA cannot be
successfully accomplished within that individual dataset. 

We conclude this section by commenting on possible strategies to decide the order of truncation 
of the PCA reconstruction that is needed for a specific dataset, since ideally one would like to devise 
some type of automated procedure for fringe removal from Stokes data. 
The data that we have analyzed in this work were acquired during observations that were
not specifically prepared to optimize the removal of polarized fringes. Therefore, the information 
content of fringes varies greatly between different datasets. As we have demonstrated, this affects 
both the quality of the fringe removal, as well as the number of PCA eigenfeatures that must be 
retained in order to attain the best compromise between fringe removal and preservation of the 
original spectral line signals. Because of this, there is some degree of subjectivity in the truncation 
of the PCA eigenfeature expansion. We have already shown how clipping out strong signals from 
the spectral range that is to be analyzed in terms of principal components allows one to retain a smaller
number of expansion orders. 

On the other hand, in the data presented here, the spectral information of the Stokes line 
profiles is never completely "orthogonal" to that of the polarized fringes. In addition, the ampli- 
tude of the fringes is typically of the same order of magnitude as the Stokes signals of the He I 
triplet that we want to refine. Thus, many of the eigenfeatures that contribute to the fringe pat- 
tern across a dataset will also contain information on the line signals (such as velocity shifts and 
asymmetries), which may be critical for the science. Under these conditions, the truncation of the 
PCA reconstruction expansion still requires human oversight, unfortunately, and so can hardly be 
automated. 

Besides truncation of the PCA expansion in data reconstruction, a related issue is that of
order selection, that is, the retaining or dropping of critical eigenfeatures that work in favor of
the preservation of the line signal simultaneously with the removal of polarized fringes and other
artifacts. A cursory look at the structure of the basis eigenfeatures U_j and of the subimages V_{ij} U_j^T
that contribute to a given scan step of the dataset (cf. eq. [4]) is often very helpful to determine
which order contains relevant spectral information, and which one instead can be dropped in order 
to remove the fringe background without critical loss of science data. 

To illustrate this idea, Fig. [5] shows the first 18 eigenfeatures for the dataset from which the 
two scan steps of Fig. [4] were taken. We see at once some notable characteristics of this basis subset.
The eigenfeatures #1 and #4 are completely dominated by the strong, sharp spectral signature of 
the H2O line, while their contribution to the He I spectral range is effectively zero. The polarized 
fringes start appearing already in the eigenfeature #7, overlapped with a strong antisymmetric 
(i.e., velocity-type) component of the Si I line. Evidently the eigenfeature #15 is also completely 
dominated by fringes in the He I region. The eigenfeatures #14 and #17 do not tend to zero at the 
boundary of the spectral range. This is likely caused by a residual gradient across the detector frame
after flat-fielding, which could affect the proper placement of the continuum level of the spectrum.
With the exception of #7, the first eight eigenfeatures of Fig. [5] appear to be completely clear of 
the fringe background, only contributing to the line spectral signal that is relevant for science. All 
the subsequent eigenfeatures instead show some mixing of line spectra and fringes. Elimination of
these eigenfeatures will inevitably cause some loss of science data, although this may occur at
a level that lies below the target sensitivity of a particular observation.
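
The visual inspection described above can be complemented by a simple number: the fraction of each eigenfeature's squared amplitude that falls inside a chosen spectral window. The following is a minimal sketch under stated assumptions. The basis is taken to be an array `evecs` of shape (n, K) with one eigenfeature per column, and the pixel range used for the window is a hypothetical placeholder rather than the actual position of the He I triplet on the detector.

```python
import numpy as np

def window_power(evecs, window):
    """Fraction of each eigenfeature's squared amplitude inside a spectral window.

    evecs  : array (n, K), one eigenfeature per column (assumed layout).
    window : boolean mask of length n, or a slice, along the spectral axis.
    """
    power = evecs**2
    return power[window].sum(axis=0) / power.sum(axis=0)

# Stand-in orthonormal basis (random, for illustration only).
rng = np.random.default_rng(0)
evecs, _ = np.linalg.qr(rng.normal(size=(40, 12)))

he_window = np.zeros(40, dtype=bool)
he_window[10:20] = True            # hypothetical pixel range of the line of interest

inside = window_power(evecs, he_window)
# Orders with a fraction close to zero contribute almost nothing inside the window.
```

An order with a fraction close to zero behaves like the eigenfeatures #1 and #4 discussed above. A large fraction does not distinguish line signal from fringes, so it cannot replace the inspection of the eigenfeatures themselves.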

Fig. 6. — Different PCA reconstructions of the same Stokes-Q scan step; left: using the full
set of the first 14 eigenfeatures; right: reconstructing the data by dropping the subset {1,4,14} 
of eigenfeatures, and adding instead the subset {16, 18}. We note how the added eigenfeatures 
improve the reconstruction of the He I spectral data, while the quality of the fringe removal is 
preserved. 

Figure [6] illustrates an application of this strategy of order selection. The left side shows 
the same Stokes-Q scan step used also for Fig. [4] (right side). The PCA reconstruction of the
data uses the lowest 14 eigenfeatures in the PCA decomposition of the entire Stokes-Q
map. This order of truncation does a good job of removing the polarized fringes; however, it also
appears to clip some of the finer spectral and spatial information in the He I triplet from the
reconstructed data. This is noticeable, for example, in the appearance of the dark absorption 
feature located approximately at Y = 155, in the red component of He I 1083 nm (spanning 
approximately between X = 120 and X = 140). The reconstructed data shown on the right side of 
the figure is a modification of the PCA expansion used for the left side example, which was based 
on the characteristics of the eigenfeatures described above. The subset {1, 4, 14} was removed from
the truncated PCA expansion, while we added the subset {16, 18}. This operation indeed helped
reduce the H2O spectral signature in the scan. It was also able to capture more of the spectral
and spatial complexity of the He I triplet, while at the same time preserving the quality of the
removal of the polarized fringes.
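
The order selection just described amounts to rebuilding each scan step from an arbitrary list of eigenfeatures instead of the first K. A minimal sketch follows, assuming a map stored as an array of shape (N, m, n) and a 2D PCA basis obtained from the mean-subtracted image covariance, with the unsubtracted images projected onto it. The function names, the array layout, and the random stand-in data are assumptions made for illustration and do not reproduce the actual reduction code.

```python
import numpy as np

def pca2d_basis(maps):
    """Eigenvalues and eigenfeatures (columns) of the 2D-PCA image covariance.

    maps : array (N, m, n), i.e. N scan steps of m x n images (assumed layout).
    """
    centered = maps - maps.mean(axis=0)
    # n x n covariance, summed over scan steps (j) and image rows (r)
    cov = np.einsum("jrk,jrl->kl", centered, centered) / maps.shape[0]
    evals, evecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(evals)[::-1]             # reorder to descending
    return evals[order], evecs[:, order]

def reconstruct(maps, evecs, orders):
    """Rebuild every scan step from the listed eigenfeature orders (1-based)."""
    U = evecs[:, [k - 1 for k in orders]]       # (n, K) selected eigenfeatures
    V = maps @ U                                # (N, m, K) projections of each image
    return V @ U.T                              # (N, m, n) reconstructed images

# Random stand-in data, not the observations discussed in the text.
rng = np.random.default_rng(0)
maps = rng.normal(size=(40, 32, 24))
evals, evecs = pca2d_basis(maps)

truncated = reconstruct(maps, evecs, range(1, 15))          # first 14 orders
orders = [k for k in range(1, 15) if k not in (1, 4, 14)] + [16, 18]
selected = reconstruct(maps, evecs, orders)                 # drop {1,4,14}, add {16,18}
```

Because the eigenfeatures are orthonormal, dropping or adding an order does not change the contribution of any other order, which is what makes this kind of trial-and-error selection practical.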



5. Conclusions 

The problem of the identification and removal of instrumental artifacts, such as polarized fringes,
from spectro-polarimetric maps lends itself naturally to a treatment by pattern recognition methods,
e.g., Principal Component Analysis (PCA). In this paper, we have presented the 2D PCA algorithm
of Yang et al. (2004), which appears to outperform other PCA strategies in the tests of the authors
of that work.



However, it is interesting to comment briefly also on the more traditional implementation of
PCA to face recognition (Turk & Pentland 1991). In this alternate approach, each m x n image
matrix $A_j$ is rearranged into a (m x n)-vector $B_j$. The dataset of N images can then be represented
by the (m x n) x N matrix $B = \{B_j\}_{j=1,\ldots,N}$, and the PCA covariance matrix is then built as



$$ C = \frac{1}{N}\,(B - \bar{B})^T (B - \bar{B}), \qquad \bar{B} = \frac{1}{N} \sum_{j=1}^{N} B_j . \qquad (6) $$



We note that C so defined is an N x N matrix, and that both the spatial and spectral dimensions of
the original images $A_j$ have been contracted in order to compute it. This approach has the notable
advantage of producing eigenfeatures that resemble the original images (see Turk & Pentland 1991
for details), which provide a direct visual aid for the selection of the orders that isolate the spectral
data from the fringe background. On the other hand, the high level of data compression tends
to produce a much slower convergence of the PCA reconstruction series than for the 2D PCA
algorithm. The alternative of using the products $(B_j - \bar{B})(B_j - \bar{B})^T$ for the definition of C (see
also Sect. E|) would preserve both spatial and spectral information of the covariance of the original 
dataset. However, this would often create a matrix, of size (m x n) x (m x n), that is too big to be diagonalized
efficiently for the typical values of the image dimensions, m and n. That is why the 2D PCA
algorithm appears more suitable for our problem, despite the fact that the order selection may 
be more cumbersome with this approach. We defer a more detailed study of the potential and 
limitations of the traditional PCA approach to future work. 
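
For reference, the traditional approach outlined above can be sketched in a few lines. The example below is only an illustration under stated assumptions. It flattens each image into a column of B, diagonalizes the small N x N matrix, and maps its eigenvectors back to image space, which is the standard device for obtaining the nonzero-eigenvalue eigenvectors of the much larger (m x n) x (m x n) covariance without ever forming it. The function name, the array layout, and the relative threshold used to discard numerically null orders are assumptions of this sketch.

```python
import numpy as np

def eigenimages(maps, rel_tol=1e-10):
    """Image-shaped eigenfeatures from the N x N covariance of flattened images.

    maps : array (N, m, n), i.e. N images of m x n pixels (assumed layout).
    Returns the retained eigenvalues (descending) and an array (K, m, n).
    """
    N, m, n = maps.shape
    B = maps.reshape(N, m * n).T                  # (m*n, N), one column per image
    D = B - B.mean(axis=1, keepdims=True)         # subtract the mean image
    C = D.T @ D / N                               # N x N matrix, cheap to diagonalize
    evals, w = np.linalg.eigh(C)                  # ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, w = evals[order], w[:, order]
    good = evals > rel_tol * evals[0]             # mean subtraction leaves rank <= N-1
    E = D @ w[:, good]                            # map eigenvectors back to image space
    E /= np.linalg.norm(E, axis=0)                # normalize each eigenimage
    return evals[good], E.T.reshape(-1, m, n)

# Random stand-in data, for illustration only.
rng = np.random.default_rng(0)
maps = rng.normal(size=(12, 8, 10))
evals, imgs = eigenimages(maps)
```

Each element of `imgs` has the shape of an original image, which is what provides the direct visual aid mentioned above. At most N - 1 such eigenimages are available, since the mean subtraction removes one degree of freedom.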

The strategy to make PCA succeed in the removal of polarized fringes from spectro-polarimetric 
data is ideally to guarantee the presence in a dataset of widely diverse realizations of a line's 
polarization profiles over a practically time-independent fringe background. In fact, this determines 
the condition of non-correlation between the spectral signal and the polarized fringes, which is at
the very basis of the working concept of PCA (Jolliffe 2002).



For the He I 1083 nm data presented in this work, this condition is often met, and as expected
PCA manages to separate rather well the spectral line information from the fringe background. The
Si I line is always prominent in the solar intensity spectrum. However, this should not always be
the case for polarization, e.g., for quiet-Sun observations. So one can hope that the same strategy
will work with that line as well, on the condition that a sufficient diversity of polarization signals
over the fringe background can be acquired during an observation. In the active-region data that
we have analyzed, instead, the polarization signals of the Si I line are typically at the level of
10-20% for both linear and circular polarization, and thus the subtraction of the fringe background 
in that spectral range fails consistently. This problem is greatly mitigated by the fact that the 
typical amplitude of the fringes is relatively small (~ 0.2%) in our data, compared to the observed 
amplitudes of the Si I polarization, and so the polarization profiles of the line are not significantly 
affected by the fringe background. 

The results presented in this paper point in the direction of revising the way that the acquisition 
of science and calibration data should be planned for a typical spectro-polarimetric observing run. 
Looking at different targets on the quiet Sun, while maintaining the same configuration of the 
spectrograph in order to preserve the stationarity of the fringe background, could turn out to be a 
fundamental addition to all observing programs. Lamp flat fields may also be an important tool for
"fringe calibration", as they could be used to augment the set of fringe data devoid of spectral
line signals for the purpose of "training" the PCA in the identification of fringes. Of course, these
lamp flats should ideally be taken under the same optical configuration of the spectrograph as used 
for a given observation, which may not always be possible. 

The authors enjoyed several stimulating discussions with A. Lopez Ariste (THeMIS-CNRS, 
France) and A. Asensio Ramos (IAC, Spain) on the problem of removal of polarized fringes by 
PCA and other methods. They are also grateful to HAO colleagues L. Kleint and R. Centeno 
Elliot, D. Elmore (NSO), and J. Stenflo (ETH, Switzerland), for helpful comments. 